3. Data Consistency and the "Theory of relativity"

"If a tree falls in a forest and no one is around to hear it, does it make a sound?"
If a system has an inconsistency but no one is able to observe it, is it still an inconsistency?



Introduction

We need to scale data processing systems geographically to achieve lower Latency and (at least partial) Availability in case of network Partitioning. But CAP/PACELC tells us that we cannot achieve strong Consistency in this case. When we increase the Consistency requirements, we have to accept lower Availability and higher Latency.



What is the minimum consistency level that we need?

If eventual consistency is enough for your system, things are pretty clear and relatively simple. Most probably you want Strong Eventual Consistency, which is relatively cheap and provides nice guarantees. For this you will have to use something like CRDTs (conflict-free replicated data types). Some theoretical results assure us that you cannot find something way cleverer than CRDTs that achieves Strong Eventual Consistency.
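
As a rough illustration of the CRDT idea, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. The class name, the replica ids "eu" and "us", and the amounts are assumptions made for this example, not something prescribed by the article.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of delivery order or duplicates.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two hypothetical regions accept writes independently.
eu = GCounter("eu")
us = GCounter("us")
eu.increment(3)
us.increment(2)

# State is exchanged asynchronously, here with one duplicated delivery.
eu.merge(us)
us.merge(eu)
us.merge(eu)
```

After the merges both replicas hold the same state, no matter in which order the merges were applied. No coordination was needed at write time, which is what makes this kind of consistency cheap.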

Often, you have business flows where eventual consistency is just not enough. For example, after you update data in one region, you expect a subsequent read on another geographical node to return the updated value, as in "Read-Your-Writes Consistency". This sounds like a pretty simple requirement, but it is really hard to implement correctly, and it will increase the Latency of the system and/or decrease its Availability compared with eventual consistency.


The Consistency that you cannot have


There is nothing like a "global clock" in a distributed system. As in the "Theory of Relativity", the order of events in a distributed system is relative to the observer. Two observers can perceive the same two events in a different order when the events are very close in time and/or happen at a great distance from each other.

The observers in the "Theory of Relativity" are limited by the speed of light in learning about events that happen remotely. The greater the spatial distance between events, the greater the time skew that can affect an observer. Even when the physical distance is small, events that happen close together in time can be observed in a different order depending on the position of the observer.


The order that you cannot have

In general, you cannot determine the order of two events A and B if the "time interval" between them is smaller than the time required for light to travel between their locations. "Time interval" is an ambiguous notion in this case, but it is intuitively useful. We can use the time interval as seen by an observer who is equidistant from the two events.

When the "time interval" is smaller than the time needed for light to travel between A and B, an observer close to one event will perceive the other event as happening after the event that they are close to.

An observer between A and B might observe the two events as simultaneous. An observer close to A will perceive event B happening after A. An observer close to B will perceive event A happening after B. Observers see the order of events relative to their location. This is because information cannot travel faster than light, and the information from the closer event reaches the observer sooner.

For example, we see the past of the distant stars. If there were an observer on a very distant star with a powerful enough telescope, they might see the Earth in the era of the dinosaurs.
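
The effect can be reproduced with a toy model. This is a sketch of plain signal propagation on a line, not a relativistic calculation. The positions, the 5 ms interval and the function names are assumptions chosen for the example.

```python
C_KM_PER_MS = 300.0  # approximate speed of light in vacuum, km per millisecond


def arrival_ms(event_pos_km: float, event_time_ms: float, observer_pos_km: float) -> float:
    """Time at which the signal from an event reaches an observer on the line."""
    return event_time_ms + abs(observer_pos_km - event_pos_km) / C_KM_PER_MS


def perceived_order(observer_pos_km: float, a: tuple, b: tuple) -> str:
    """Order in which an observer receives the signals of events a and b."""
    ta = arrival_ms(a[0], a[1], observer_pos_km)
    tb = arrival_ms(b[0], b[1], observer_pos_km)
    if abs(ta - tb) < 1e-9:
        return "simultaneous"
    return "A then B" if ta < tb else "B then A"


# Hypothetical events as (position in km, time in ms).
# They are 3000 km apart (10 ms at light speed) but only 5 ms apart in time.
event_a = (0.0, 0.0)
event_b = (3000.0, 5.0)

near_a = perceived_order(0.0, event_a, event_b)
near_b = perceived_order(3000.0, event_a, event_b)
in_between = perceived_order(2250.0, event_a, event_b)
```

With these numbers the observer next to A receives A first, the observer next to B receives B first, and an observer at 2250 km receives both signals at the same moment.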


Order in distributed systems

Distributed systems are limited by the same speed of light when transmitting information. In practice it takes even more time to propagate information between distant network nodes, because there is no single direct optical fiber between any two nodes: switches and routers add extra latency. Because of this, the same distance sometimes shows higher latency over land than across the ocean, because on land you have many more routers.

Even if communication happened at the speed of light, information would still take tens of milliseconds to propagate between continents. The greater the distance, the greater the Latency impact of assuring Consistency. This is why it's easy to assure strong Consistency in a single server, or even in a server cluster across nearby datacenters, but you cannot practically implement ACID in a cross-continent database.
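
A back-of-the-envelope calculation shows the order of magnitude. The 6000 km distance is a hypothetical round figure for an intercontinental path. The 2/3 factor is the usual approximation for the speed of light in optical fiber. Router and switch delays are ignored.

```python
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3          # approximate slowdown of light in optical fiber


def one_way_delay_ms(distance_km: float, speed_km_s: float) -> float:
    """Pure propagation delay, without any routing or processing time."""
    return distance_km / speed_km_s * 1000.0


distance_km = 6000.0  # hypothetical intercontinental distance

vacuum_ms = one_way_delay_ms(distance_km, C_VACUUM_KM_S)
fiber_ms = one_way_delay_ms(distance_km, C_VACUUM_KM_S * FIBER_FACTOR)
round_trip_ms = 2 * fiber_ms

print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms, round trip: {round_trip_ms:.1f} ms")
```

This gives about 20 ms one way in vacuum and about 30 ms in fiber, so a single synchronous round trip over this distance costs around 60 ms before any router has touched the packet.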


Enforcing an order

As the order of events that are close in time depends on the observer, the only way to observe the order of events coherently is to force distant observers to not observe events directly. A simple way to do this is to have an "official observer" that provides a consistent view of events to all other observers.

For strong consistency we always need a "serialization point" or "locking point", as used in many transactional systems. More efficient systems exist that can move the "serialization point" to optimize latency. However, you cannot observe the order of events consistently without choosing a privileged observer, often named "the source of truth".

Consistency systems based on consensus use... a quorum of observers instead of a single observer. This can make the system more resilient in case one of the observers becomes unavailable. Still, we need to pay the cost of reaching the same observer from both observing sides (for example the write and the subsequent read).

You just cannot have read-your-writes consistency without paying the full latency between the read/write nodes. Usually the latency paid is higher, as the sum of the latencies to the observer can be greater than the latency between the read/write nodes.
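
A small sketch of this cost, assuming three hypothetical regions and made-up one-way latencies. The writer sits in "eu", the reader in "us", and the "official observer" can be placed in any region.

```python
# Hypothetical one-way latencies between regions, in milliseconds.
LATENCY_MS = {
    ("eu", "us"): 80,
    ("eu", "asia"): 160,
    ("us", "asia"): 120,
}


def latency(a: str, b: str) -> int:
    """Symmetric lookup; talking to your own region is treated as free."""
    if a == b:
        return 0
    return LATENCY_MS[(a, b)] if (a, b) in LATENCY_MS else LATENCY_MS[(b, a)]


def cost_via_observer(writer: str, reader: str, observer: str) -> int:
    """The write must reach the observer, then the read must reach it too."""
    return latency(writer, observer) + latency(reader, observer)


direct = latency("eu", "us")
costs = {obs: cost_via_observer("eu", "us", obs) for obs in ("eu", "us", "asia")}
```

With these figures the best placements (observer next to the writer or next to the reader) cost exactly the direct latency of 80 ms, and an observer in the third region costs 280 ms. No placement is cheaper than the direct latency.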

For systems based on consensus you need to pay the highest latency among the closest N/2+1 nodes, because the write quorum and the read quorum must overlap in at least one node - for example write on N/2+1 nodes and read on N/2+1 nodes.
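
The quorum arithmetic can be sketched as follows. The latencies are hypothetical, requests are assumed to be sent to all replicas in parallel, and the function names are invented for this example.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def quorum_latency_ms(latencies_ms: list, quorum_size: int) -> int:
    """With parallel requests, you wait for the slowest of the closest quorum_size nodes."""
    return sorted(latencies_ms)[quorum_size - 1]


def quorums_overlap(n: int, write_size: int, read_size: int) -> bool:
    """A read is guaranteed to meet the latest write only if the two sets must intersect."""
    return write_size + read_size > n


# Hypothetical latencies from one client to 5 replicas, in milliseconds.
replica_latencies_ms = [2, 15, 80, 95, 140]

n = len(replica_latencies_ms)
q = majority(n)
wait_ms = quorum_latency_ms(replica_latencies_ms, q)
```

Here the majority is 3 nodes and the client waits 80 ms, the latency of the third closest replica. Writing on only 2 of 5 nodes and reading on 3 would not guarantee an overlap, since 2 + 3 is not greater than 5.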




The Consistency that you don't need

Definition: A Consistent data system is a system that doesn't have inconsistencies

The above looks like a silly tautology, but it shows a different point of view.

Anyone would prefer Strong Consistency like "global serializability" if it were cheap. But "global serializability" is like relying on a single global observer for a distributed system, so it severely affects Latency and Availability, and even scalability.

The above definition asks you to think about the kind of inconsistencies that you care about. If you modify data for one user, is it required for another user to see that change immediately, or is it enough if the change propagates in a couple of seconds? If the users don't need a (strongly) consistent view of each other's data, you can keep each user's data close to where it is used (to ensure cheap synchronization) and replicate data asynchronously between regions.

You always want asynchronous replication over long distances, because synchronous replication would delay each write by the longest geographical latency and block all writes if one region is not available.

Thinking about which business flows need to be consistent and which can work acceptably with eventual consistency is an analysis that allows you to choose the best compromise between Consistency and Resilience/Latency.


The Consistency that you cannot observe

If you have distributed actors doing operations concurrently, there is a time slice in which you cannot determine which action happened first. Let's suppose you have "synchronized" clocks (for example via NTP). You see in the logs that the write was issued 1 millisecond before the read. This doesn't mean that the read was issued after the write. It might be that the read was made in a region whose clock is 2 milliseconds ahead.

In such small intervals, you cannot determine which event actually happened first. Therefore, you cannot determine whether a consistent system should output the old value or the new value. If you cannot know the expected result, you cannot tell whether the system is strongly consistent or just eventually consistent.

The only consistency that you can observe is the one that involves a serialization point. In order for an observer to assess the consistency of the system, we need to be able to determine an objective order of requests. And we know that there is always a slight time window where the order is not properly determined - typically the propagation latency between the nodes.
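
One way to make this concrete is to treat every logged timestamp as an interval rather than a point. The sketch below assumes that each clock is off by at most max_clock_error_ms from true time. That bound and the function name are assumptions of this example.

```python
def compare_events(ts_a_ms: float, ts_b_ms: float, max_clock_error_ms: float) -> str:
    """Order two logged timestamps taken on different machines.

    Each true time lies somewhere in [ts - error, ts + error]. The order is
    only known when the two intervals do not overlap.
    """
    if ts_a_ms + max_clock_error_ms < ts_b_ms - max_clock_error_ms:
        return "a before b"
    if ts_b_ms + max_clock_error_ms < ts_a_ms - max_clock_error_ms:
        return "b before a"
    return "undetermined"


# The write is logged 1 ms before the read, with clocks trusted to within 2 ms.
close_events = compare_events(1000.0, 1001.0, 2.0)

# The same pair, but logged 10 ms apart.
distant_events = compare_events(1000.0, 1010.0, 2.0)
```

The first pair comes out as undetermined, so neither the old value nor the new value can be called wrong for that read. The second pair is safely ordered, because the gap is larger than twice the clock error.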

There is also the time while the write is running. In that period, you cannot determine objectively if a consistent system should return the old value or the new value.

You can only evaluate read-after-write consistency when the reader starts after it has received the information that the write request finished. This is the same as having a synchronization/serialization point between the write and the read. In the above case, the serialization point is located with the reader.

We observe that we cannot avoid paying the geographical latency at the write time or at the read time. We can avoid this cost if we are sure that the read does not need to be consistent with a very recent write. When we are in a business flow, often we only need to be consistent with the previous write in that flow. The business flow might tolerate receiving an even more recent write, as long as the data is not older than the write triggered by the flow. When this is the case, we can design distributed systems that are consistent enough and have good Latency and Availability.

What is important is to not try to assure a higher consistency level than users can observe or care about. Each additional consistency guarantee is paid for with higher Latency and lower Availability.


How can you observe an inconsistency?

A better way to think about consistency is to ask yourself "how can a user of the system notice an inconsistency?"

Think about the situation above. If a user's data is updated and asynchronously replicated, how can a second user observe an inconsistency? First, the second user needs to read during the second or so that the data takes to replicate everywhere. Or it can happen because the current node is disconnected from the writing node. These are corner cases; in many cases it is acceptable to read stale data if this happens rarely.

But how can the second user KNOW that there was a data inconsistency in the system? Well, the second user needs to get notified on a communication channel that the first write was completed. Even in this case, the second user cannot assess the Consistency in the short time before the write is finished. Also, while the write information is in transit, there is no objective way for the second user to know that a read should return the new value and not the old one.

We can imagine that the first user triggers an update on their data, then calls the second user to tell them about the update. It is likely that making the call takes longer than the propagation time, so the second user will not see the outdated value, even with asynchronous replication. In these cases, it is very likely that different users cannot observe inconsistency. And when the users cannot observe the inconsistency, they should not be affected by it.

If the two users synchronize themselves in the same room to start the read immediately after the write, the second might see the outdated value. But what is the harm or what is the gain in doing this? There are very special cases when such inconsistencies matter, for example if two users can concurrently consume the same voucher. But except for very special cases that you can detect, you can settle for eventual consistency or flow consistency.


Consistency on invariants

Moving money from one account to another is a case of enforcing invariants. This doesn't work with eventual consistency. You don't want two concurrent requests to each check that there is enough balance and then together debit more than the existing money. Because of the relativity of observation, you need to use a serialization point to enforce the debit. However, you are free to propagate the money asynchronously. In order to observe the "inconsistency", the first user needs to call the recipient, and the recipient must check their balance. This really happens; sometimes the money is not there yet. But... some delay is acceptable to the business, and people accept it.
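
A minimal single-process sketch of this split: the debit goes through a serialization point (a lock standing in for whatever the real system uses), while the credit is queued and delivered later. The class, the queue and the account names are all hypothetical.

```python
import threading
from collections import deque


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()  # stands in for the serialization point

    def try_debit(self, amount: int) -> bool:
        # Check and debit happen atomically, so the balance never goes negative.
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True

    def credit(self, amount: int) -> None:
        with self._lock:
            self.balance += amount


pending_credits = deque()  # stands in for asynchronous cross-region replication


def transfer(source: Account, dest_id: str, amount: int) -> bool:
    """Debit synchronously; the credit only gets queued."""
    if not source.try_debit(amount):
        return False
    pending_credits.append((dest_id, amount))
    return True


def deliver_pending(accounts: dict) -> None:
    """Apply queued credits, as the asynchronous propagation would do later."""
    while pending_credits:
        dest_id, amount = pending_credits.popleft()
        accounts[dest_id].credit(amount)


accounts = {"alice": Account(100), "bob": Account(0)}
ok_first = transfer(accounts["alice"], "bob", 70)
ok_second = transfer(accounts["alice"], "bob", 70)   # rejected: only 30 left
bob_before_delivery = accounts["bob"].balance        # the money is "not there yet"
deliver_pending(accounts)
```

The invariant (no overdraft) is protected at debit time. The window in which the recipient does not yet see the money is the delay that the business accepts.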

You can further relax the business rules and allow the card balance to go negative (like a "credit card"). You can then perform the writes asynchronously. The bank has to accept the risk that the sum of debits might occasionally go over the credit limit, but you buy great Availability and resiliency for this price.

Voucher consumption is also a case of an invariant: a voucher should be consumed by at most one user. This is usually handled by a lock in a serialization point. Because of propagation delay, there is no way to settle this transaction for multiple users without paying the maximum latency between the attempting users. Even if you reserve the voucher for all users and then have a way to choose the one who wins (and roll back the others), you still need to wait for all attempts to propagate to everyone. Because concurrent requests happen rarely, it is often cheaper to lock the voucher in a single place that is hopefully in the same geographical region as the user.
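
A sketch of the "lock the voucher in a single place" option, with an in-process lock standing in for the real serialization point. The registry class, the voucher code and the user names are invented for this example.

```python
import threading


class VoucherRegistry:
    """Single serialization point that decides who consumes each voucher."""

    def __init__(self):
        self._consumer = {}
        self._lock = threading.Lock()

    def consume(self, voucher_id: str, user_id: str) -> bool:
        # The first caller to reach the registry wins; later callers lose.
        # A retry by the winner also returns True, so the call is idempotent.
        with self._lock:
            return self._consumer.setdefault(voucher_id, user_id) == user_id


registry = VoucherRegistry()
first = registry.consume("SPRING-10", "alice")
second = registry.consume("SPRING-10", "bob")
retry = registry.consume("SPRING-10", "alice")
```

The winner is simply whoever reaches the registry first, which is the order seen by the privileged observer. Each attempt pays only its own latency to the registry instead of waiting for every region to hear about every attempt.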


Flow consistency

Except for invariants that need to be handled in a serialization point, most cases that involve consistency mainly require not reading stale data in a distributed read-after-write flow. As seen above, two observers (users) cannot observe read-after-write inconsistencies in a system using fast asynchronous replication, except when they are involved in a special flow of requests in which the reader is informed about the write completion.

You can only verify consistency after the write finishes. If two users sit in the same room, the first starts a write and the other immediately does a read in another region, there is no way to verify whether the system is strongly consistent or (fast) eventually consistent. If the read returns stale data, it might be that the system is strongly consistent, but the write request was buffered on the network and the read request propagated faster, reaching the system before the write.

So, you can only observe stale data in a tight flow of requests. You care about read-after-write consistency when you have a flow where a completed write is followed by a read of the affected data. Usually, each geographical node can be kept strongly consistent; inconsistencies usually appear when we spread requests among multiple geographical regions.

For the flows where you really need Flow consistency there are several approaches. Probably the most efficient way is to carry some consistency information with the flow - see links at the bottom.
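
One possible shape of "carrying consistency information with the flow" is a version token, sketched below. This is an illustration under simplifying assumptions (a single primary, replication applied in version order), not necessarily the design described in the linked article. All names are hypothetical.

```python
class Region:
    """One geographical node: a copy of the data plus the last applied version."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply(self, version: int, key: str, value: str) -> None:
        self.data[key] = value
        self.version = version


def write(primary: Region, key: str, value: str) -> int:
    """Write at the serialization point and return its version as the flow token."""
    primary.apply(primary.version + 1, key, value)
    return primary.version


def read(local: Region, primary: Region, key: str, token: int):
    """Serve locally when the local copy already covers the flow's write."""
    if local.version >= token:
        return local.data.get(key), "local"
    # The local copy is behind this flow: pay the remote latency only in this case.
    return primary.data.get(key), "primary"


primary, replica = Region(), Region()
token = write(primary, "email", "new@example.com")

behind = read(replica, primary, "email", token)     # replica has not caught up yet
replica.apply(token, "email", "new@example.com")    # asynchronous replication arrives
caught_up = read(replica, primary, "email", token)
```

Only the flow that holds the token can be forced onto the slow path. Readers outside the flow carry no token (or an old one) and are always served locally, which matches the idea of not enforcing more consistency than anyone can observe.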


Bottom line

Similar to the "Theory of relativity", changes in a distributed system cannot always be ordered objectively in time when they happen within a very short interval. It makes little sense to require a Consistency level where all events have a global order, and such a level is prohibitively costly in Latency and Availability.

For many cases that need more than eventual consistency, it is usually enough to enforce read-your-writes consistency separately on each flow of requests. In most cases, users that are not involved in a specific business flow will not be able to observe inconsistencies with fast eventual consistency, or will not care much about them.

It only makes sense to enforce consistency when the order of events can be determined - for example when a business flow finishes a write and then expects to read the affected data. For such cases we can still design scalable distributed systems that have good Latency and Availability.


Previous work:

2. Flow consistency - read-your-writes consistency



Dear reader, please leave a message if you exist! ;) Also, please share this article if you find it interesting. Thank you.
